Typical Depth of a Digital Search Tree Built on a General Source
Authors
Abstract
The digital search tree (dst) plays a central role in compression algorithms of Lempel-Ziv type. This important structure can be viewed as a mix of a digital structure (the trie) and a binary search tree. Its probabilistic analysis is therefore involved, even when the text is produced by a simple source (a memoryless source or a Markov chain). After the seminal paper of Flajolet and Sedgewick (1986) [11], which deals with the unbiased memoryless case, many papers by Drmota, Jacquet, Louchard, Prodinger, Szpankowski, and Tang, published between 1990 and 2005, dealt with general memoryless sources or Markov chains and analyzed the main parameters of dst's, namely internal path length, profile, and typical depth (see for instance [7, 15, 14]). Here, we are interested in a more realistic analysis, where the words are emitted by a general source, in which the emission of each symbol may depend on the whole previous history. Analyses of text algorithms and digital structures have already been carried out for general sources, for instance for tries [3, 2] and for basic sorting and searching algorithms [22, 4]. However, the case of digital search trees has not yet been considered, and it is the main subject of this paper. The idea of this study is due to Philippe Flajolet, and the first steps of the work were carried out with him at the end of 2010.
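To make the objects in the abstract concrete, here is a minimal Python sketch (not the authors' code) that builds a dst by the usual insertion rule. The first word occupies the root, and each later word descends by reading its symbols one at a time (0 = left, 1 = right) until it reaches an empty slot. The sketch assumes a binary alphabet and models a source by its conditional probability of emitting 1 given the whole prefix emitted so far, which covers memoryless sources and history-dependent ones alike. The names `memoryless`, `polya_urn`, `LazyWord`, `dst_insert`, `build_dst` and `mean_typical_depth` are hypothetical. "Typical depth" is taken here, as one common convention, to be the depth of a uniformly chosen stored word, so its empirical mean equals the internal path length divided by n.

```python
import random
from typing import Callable, List, Optional, Tuple

# Assumption: a binary source is modeled by P(next symbol = 1 | prefix so far).
Source = Callable[[List[int]], float]


def memoryless(p: float) -> Source:
    """Memoryless (Bernoulli) source: each symbol is 1 with probability p."""
    return lambda prefix: p


def polya_urn(prefix: List[int]) -> float:
    """Illustrative history-dependent source (Polya urn / Laplace rule):
    P(next = 1) = (1 + number of 1s so far) / (2 + length of prefix)."""
    return (1 + sum(prefix)) / (2 + len(prefix))


class LazyWord:
    """Infinite word whose symbols are drawn from the source only when read."""

    def __init__(self, source: Source, rng: random.Random) -> None:
        self.source = source
        self.rng = rng
        self.bits: List[int] = []

    def __getitem__(self, k: int) -> int:
        # Extend the prefix symbol by symbol, each draw conditioned on the history.
        while len(self.bits) <= k:
            p_one = self.source(self.bits)
            self.bits.append(1 if self.rng.random() < p_one else 0)
        return self.bits[k]


class Node:
    __slots__ = ("word", "child")

    def __init__(self, word: LazyWord) -> None:
        self.word = word
        self.child: List[Optional["Node"]] = [None, None]  # [0] left, [1] right


def dst_insert(root: Optional[Node], word: LazyWord) -> Tuple[Node, int]:
    """Insert word into the dst; return (root, depth where it is stored).
    The root has depth 0; a node at depth d branches on symbol word[d]."""
    if root is None:
        return Node(word), 0
    node, depth = root, 0
    while True:
        b = word[depth]
        nxt = node.child[b]
        if nxt is None:
            node.child[b] = Node(word)
            return root, depth + 1
        node, depth = nxt, depth + 1


def build_dst(n: int, source: Source, seed: int = 0) -> Tuple[Optional[Node], List[int]]:
    """Insert n independent words emitted by `source`; record each insertion depth."""
    rng = random.Random(seed)
    root: Optional[Node] = None
    depths: List[int] = []
    for _ in range(n):
        root, d = dst_insert(root, LazyWord(source, rng))
        depths.append(d)
    return root, depths


def mean_typical_depth(depths: List[int]) -> float:
    """Mean depth of a uniformly chosen stored word = internal path length / n."""
    return sum(depths) / len(depths)


if __name__ == "__main__":
    n = 2000
    for name, src in [("unbiased memoryless", memoryless(0.5)),
                      ("biased memoryless, p = 0.8", memoryless(0.8)),
                      ("Polya urn (history-dependent)", polya_urn)]:
        _, depths = build_dst(n, src, seed=1)
        print(f"{name:32s} mean typical depth = {mean_typical_depth(depths):.2f}")
```

Running it prints an empirical mean typical depth for an unbiased source, a biased memoryless source, and the history-dependent Pólya-urn source. One would expect the biased source to give a deeper tree, since a lower-entropy source makes words share longer prefixes.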
Publication date: 2014